NSF PAR Search | NSF Public Access Repository

Improving Multi-Camera View Recommendation with Temporal and Camera Embedding

https://doi.org/10.1109/IE64880.2025.11130132

Lee, Kuan-Ying; Zhou, Qian; Nahrstedt, Klara (June 2025, IEEE)

Multi-camera systems are essential in movies, live broadcasts, and other media. The selection of the appropriate camera for every moment has a decisive impact on production quality and audience preferences. Learning-based multi-camera view recommendation frameworks have been explored to assist professionals in decision making. This work explores how two standard cinematography practices could be incorporated into the learning pipeline: (1) not staying on the same camera for too long and (2) introducing a scene from a wider shot and gradually progressing to narrower ones. To this end, we incorporate (1) the duration of the displaying camera and (2) camera identity as temporal and camera embeddings in a transformer architecture, thereby implicitly guiding the model to learn the two practices from professional-labeled data. Experiments show that the proposed framework outperforms the baseline by 14.68% in six-way classification accuracy. Ablation studies on different approaches to embedding the temporal and camera information further verify the efficacy of the framework.
Free, publicly-accessible full text available June 23, 2026
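
The abstract above describes feeding the duration of the displaying camera and the camera identity into a transformer as temporal and camera embeddings. The Python sketch below shows one plausible way to build such inputs. It assumes a sequence of per-step feature vectors for the displayed camera and adds both embeddings to each token, the way positional embeddings are usually added. The names running_durations, EmbeddingTable, add_camera_and_duration_embeddings and max_duration, as well as the duration bucketing, are hypothetical illustrations and not the authors' implementation.

```python
import random
from typing import List, Sequence


def running_durations(camera_ids: Sequence[int]) -> List[int]:
    """For each step, count how many consecutive steps (including this one)
    the currently displayed camera has been on screen."""
    durations: List[int] = []
    for t, cam in enumerate(camera_ids):
        if t > 0 and cam == camera_ids[t - 1]:
            durations.append(durations[-1] + 1)  # same camera held: extend the run
        else:
            durations.append(1)  # a cut happened (or first step): restart at 1
    return durations


class EmbeddingTable:
    """Hypothetical stand-in for a learnable lookup table (e.g. nn.Embedding).
    Weights are only randomly initialised here; no training is performed."""

    def __init__(self, num_entries: int, dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                        for _ in range(num_entries)]

    def __call__(self, index: int) -> List[float]:
        return self.weights[index]


def add_camera_and_duration_embeddings(
    features: Sequence[Sequence[float]],
    camera_ids: Sequence[int],
    camera_table: EmbeddingTable,
    duration_table: EmbeddingTable,
    max_duration: int,
) -> List[List[float]]:
    """Sum each per-step feature vector with a camera-identity embedding and a
    duration embedding. Durations longer than max_duration share the last bucket."""
    durations = running_durations(camera_ids)
    tokens = []
    for feat, cam, dur in zip(features, camera_ids, durations):
        dur_index = min(dur, max_duration) - 1  # bucket 0 = first step on this camera
        cam_vec = camera_table(cam)
        dur_vec = duration_table(dur_index)
        tokens.append([f + c + d for f, c, d in zip(feat, cam_vec, dur_vec)])
    return tokens


if __name__ == "__main__":
    cams = [0, 0, 1, 1, 1, 2]  # displayed camera at each step (illustrative)
    print(running_durations(cams))  # [1, 2, 1, 2, 3, 1]

    dim, num_cameras, max_duration = 4, 6, 8
    feats = [[0.0] * dim for _ in cams]  # placeholder visual features
    tokens = add_camera_and_duration_embeddings(
        feats, cams,
        EmbeddingTable(num_cameras, dim, seed=0),
        EmbeddingTable(max_duration, dim, seed=1),
        max_duration,
    )
    print(len(tokens), len(tokens[0]))  # 6 4
```

Summing rather than concatenating keeps the token width unchanged, which is one common choice; since the paper's ablations compare different ways of embedding the temporal and camera information, the actual design may differ.
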
miVirtualSeat: A Next Generation Hybrid Telepresence System

https://doi.org/10.1145/3746441.3748238

Nahrstedt, Klara; Sitaraman, Ramesh; Chakareski, Jacob; Zink, Michael; Wu, Mingyuan; Wang, Lingdong; Chen, Bo; Ji, Ruifan; Lee, Kuan-Ying; Murray, John; et al. (September 2025, ACM)

Free, publicly-accessible full text available September 8, 2026
Pseudo Dataset Generation for Out-of-domain Multi-Camera View Recommendation

https://doi.org/10.1109/VCIP63160.2024.10849905

Lee, Kuan-Ying; Zhou, Qian; Nahrstedt, Klara (December 2024, IEEE)

Multi-camera systems are indispensable in movies, TV shows, and other media. Selecting the appropriate camera at every timestamp has a decisive impact on production quality and audience preferences. Learning-based view recommendation frameworks can assist professionals in decision-making. However, they often struggle outside of their training domains, and the scarcity of labeled multi-camera view recommendation datasets exacerbates the issue. Based on the insight that many videos are edited from the original multi-camera footage, we propose transforming regular videos into pseudo-labeled multi-camera view recommendation datasets. By training the model on pseudo-labeled datasets derived from videos in the target domain, we achieve a 68% relative improvement in the model’s accuracy in the target domain and bridge the accuracy gap between in-domain and never-before-seen domains.
Full Text Available
Do Pre-trained Models Benefit Equally in Continual Learning?

https://doi.org/10.1109/WACV56688.2023.00642

Lee, Kuan-Ying; Zhong, Yuanyi; Wang, Yu-Xiong (January 2023, IEEE/CVF Winter Conference on Applications of Computer Vision (WACV))

Full Text Available
SEAWARE: Semantic Aware View Prediction System for 360-degree Video Streaming

https://doi.org/10.1109/ISM.2020.00016

Park, Jounsup; Wu, Mingyuan; Lee, Kuan-Ying; Chen, Bo; Nahrstedt, Klara; Zink, Michael; Sitaraman, Ramesh (December 2020, IEEE International Symposium on Multimedia (ISM))
Future view prediction for a 360-degree video streaming system is important for saving network bandwidth and improving the Quality of Experience (QoE). Historical view data of a single viewer and of multiple viewers have been used for future view prediction. Video semantic information is also useful for predicting the viewer's future behavior. However, extracting video semantic information requires powerful computing hardware and large memory space to perform deep learning-based video analysis, which most client devices, such as small mobile devices or Head-Mounted Displays (HMDs), cannot provide. Therefore, we develop an approach in which video semantic analysis is executed on the media server, and the analysis results are shared with clients via the Semantic Flow Descriptor (SFD) and the View-Object State Machine (VOSM). SFD and VOSM are new descriptive additions to the Media Presentation Description (MPD) and the Spatial Relationship Description (SRD) that support 360-degree video streaming. Using this semantic-based approach, we design the Semantic-Aware View Prediction System (SEAWARE) to improve overall view prediction performance. Evaluation on 360-degree videos and real HMD view traces shows that SEAWARE improves view prediction performance and streams high-quality video under limited network bandwidth.
Full Text Available
